Abstract

We show that the depth of quantum circuits in a realistic architecture where a classical controller
determines which local interactions to apply on the $k$D grid $\mathbb{Z}^k$ where $k \ge 2$ is the same (up to a constant
factor) as in the standard model where arbitrary interactions are allowed. This allows minimum-depth
circuits for the nearest-neighbor architecture to be obtained from minimum-depth circuits in the standard 
abstract model. Our work therefore justifies the standard assumption that interactions can be performed 
between arbitrary pairs of qubits. In particular, our results imply that Shor's algorithm, controlled 
operations and fan-outs can be implemented in constant depth, polynomial size and polynomial width 
in this architecture. 

We also present optimal non-adaptive quantum circuits for controlled operations and fan-outs on a
$k$D grid. These circuits have depth $\Theta(\sqrt[k]{n})$, size $\Theta(n)$ and width $\Theta(n)$. Our lower bound is for a general
class of operations which includes controlled operations and fan-outs as special cases.



1 Introduction 



Quantum algorithms are typically formulated at an abstract level and allow arbitrary one- and two-qubit 
interactions. However, in an actual implementation of a quantum computer, typically only local interactions 
between neighboring qubits are possible. This is the motivation for the $k$D nearest-neighbor two-qubit
concurrent ($k$D NTC) architecture [131 HI E] in which the qubits are arranged on the $k$D grid $\mathbb{Z}^k$ as
shown in Figure 1a for the case where $k = 2$. Operations may involve one or two qubits with the restriction
that two-qubit operations may only be performed along an edge in the grid. Multiple operations may be
performed concurrently as long as they are on disjoint sets of qubits; an example is shown in Figure 1b.

The idea of using a classical controller to determine which operations to apply at each step is implicit 
in the pre- and post-processing stages of Shor's algorithm [12] and is necessary for fault-tolerant quantum
computation. Since the classical controller can take intermediate measurement outcomes into account, this 
model includes the class of adaptive quantum circuits as a special case. It is potentially even more powerful 
since the classical controller can perform randomized polynomial-time computations to determine which 
operations to apply as well as perform pre- and post-processing. Since quantum operations are far more 
expensive than classical operations, we are primarily concerned with the depth of the quantum circuit and 
do not count the operations performed by the classical controller as long as they take polynomial time. 

In this work, we study the depth required to perform operations in both the classical-controller $k$D NTC
($k$D CCNTC) architecture, a classical controller model where interactions are restricted to a $k$D grid,
as well as the non-adaptive $k$D NTC ($k$D NANTC) architecture where no classical controller is used and the
operations applied cannot depend on intermediate measurement outcomes. The CCNTC model ignores the
cost of offline computations performed by the classical controller and assumes that there are no classical 
locality restrictions. However, this is realistic since the clock rate for a classical computer is much faster 
than for a quantum computer. Because quantum computers are already forced to be parallel devices in order 
to perform operations fault tolerantly pQ, the total runtime of the quantum operations is proportional to 
the depth of the corresponding quantum circuit. The restriction that interactions are between neighboring 
nodes on a kD grid comes from the underlying physical device: in most technologies, only qubits that are 
spatially close can interact. 

We first compare the standard classical controller abstract concurrent (CCAC) architecture to $k$D CCNTC.
Our goal is to simulate CCAC in $k$D CCNTC. We accomplish this using a 2D CCNTC teleportation
scheme that allows arbitrary interactions on disjoint sets of qubits to be performed in constant depth.

Theorem 1.1. Suppose that C is a CCAC quantum circuit with depth d, size s and width n. Then C can
be simulated in $O(d)$ depth, $O(sn)$ size and $n^2$ width in 2D CCNTC.

This theorem shows that the standard assumption that quantum algorithms can perform arbitrary
interactions is reasonable in 2D CCNTC. A corollary is that the depths required to implement any operation
in CCAC and $k$D CCNTC where $k \ge 2$ are the same up to a constant factor.

Corollary 1.2. Let $\mathcal{E}$ be a quantum operation on n qubits. Let $d_1$ and $d_2$ be the minimum depths required
to implement $\mathcal{E}$ with error at most $\epsilon$ using poly(n) size and poly(n) width in the CCAC and $k$D CCNTC
models respectively where $k \ge 2$. Then $d_1 = \Theta(d_2)$.

It is possible to implement Shor's algorithm [12] in constant depth in CCAC [3] which implies that it can
also be implemented in constant depth in 2D CCNTC. 

Corollary 1.3. Shor's algorithm can be implemented in constant depth, polynomial size and polynomial 
width in 2D CCNTC. 



1 The original NTC architecture described by Van Meter and Itoh [15] is in fact NANTC; however, we prefer the name NANTC to
avoid confusion with CCNTC where a classical controller is used.

2 This is the AC architecture of Van Meter and Itoh [15] augmented with a classical controller. Note that the AC architecture
is not to be confused with the complexity class AC.

3 Here, we assume that there is a minimum depth required to implement $\mathcal{E}$ in CCAC when the size and width are poly(n).

Since controlled-U operations and fan-outs can also be performed in constant depth and polynomial
width in CCAC [5, 3, 13], we also have the following corollary.

Corollary 1.4. Controlled-U operations with n controls and fan-outs with n targets can be implemented in 
constant depth, poly(n) size and poly(n) width in 2D CCNTC. 

Our main technical result allows any subset of qubits to be reordered in constant depth. Theorem 1.1
follows from this as a corollary.

Theorem 1.5. Suppose we have an $n \times n$ grid where all qubits except those in the first column are in the
state $|0\rangle$. Let $T \subseteq \{0, \dots, n-1\}$ and let $\pi : T \to \{0, \dots, n-1\}$ be an injection such that for all $j \in T$ with
$\pi(j) = 0$, $\{k \in T^c \mid k < j\} = \emptyset$. Set $m = |\{j \in T \mid \pi(j) \ne 0\}|$. Then we can move each qubit at $(0, j)$ to
$(\pi(j), 0)$ for all $j \in T$ in $O(1)$ depth, $O(mn)$ size and $(m+1)n \le n^2$ width in 2D CCNTC.

Implementations of Shor's algorithm in $k$D CCNTC with various super-constant depths were previously
known for $k = 1$ and $k = 2$. Fowler, Devitt and Hollenberg [6] showed a 1D CCNTC circuit for Shor's
algorithm which requires $O(n^3)$ depth, $O(n^4)$ size and $O(n)$ width where n is the number of bits in the
integer which is being factored. Kutin [8] gave a more efficient 1D CCNTC circuit which uses $O(n^2)$ depth,
$O(n^3)$ size and $O(n)$ width. For 2D CCNTC, Pham and Svore [11] showed an implementation of Shor's
algorithm in polylogarithmic depth, polynomial size and polynomial width.

It was also previously known that controlled-U operations and fan-outs can be implemented in constant
depth, polynomial size and polynomial width in CCAC. This line of work was started by Moore [10] who
showed that parity and fan-out are equivalent and posed the question of whether fan-out has constant-depth
circuits. Høyer and Špalek [5] proved that if fan-out has constant-depth circuits then controlled-U
operations can also be implemented in constant depth with inverse polynomial error. Browne, Kashefi and
Perdrix [3] showed that one-way quantum computation is equivalent to unitary quantum circuits with fan-out.
A consequence of this is that constant depth adaptive circuits for fan-out can be used to implement
controlled-U operations in constant depth in CCAC. Takahashi and Tani [13] reduced the size of this circuit
by a polynomial and made it exact.

In many technologies, measurements are much more costly than unitary operations. For this reason, we
also consider the non-adaptive $k$D NANTC model. Here, there is no classical controller and the operations
applied depend only on the size of the input and not on intermediate measurement outcomes. Our result is
a characterization of the complexity of controlled-U operations and fan-outs in $k$D NANTC.

Theorem 1.6. The depth required for controlled-U operations with n controls and fan-outs with n targets
in $k$D NANTC is $\Theta(\sqrt[k]{n})$. Moreover, this depth can be achieved with size $O(n)$ and width $O(n)$.

If the clock speeds of the quantum computer and its classical controller are comparable, then operations
implemented using Theorem 1.6 are significantly faster than those implemented using Corollary 1.4. For this
reason, Theorem 1.6 may become a better option as quantum computing technology matures.

The layout of our paper is as follows. In Section 2 we discuss definitions used in the rest of the paper
and define the models of computation precisely. In Section 3 we review quantum teleportation and describe
teleportation chains. In Section 4 we describe our 2D teleportation scheme and show that it allows arbitrary
interactions to be implemented in constant depth in 2D CCNTC. In Section 5 we show an algorithm that
implements controlled-U operations and fan-outs for $k$D NANTC in depth $O(\sqrt[k]{n})$. In Section 6 we describe
how our techniques can be applied to obtain $k$D NANTC quantum circuits for fan-out with depth $O(\sqrt[k]{n})$. In
Section 7 we prove a matching lower bound for a class of operations which in particular contains controlled-U
operations and fan-outs.

Figure 1: The 2D NTC architecture. (a) Interactions in the 2D NTC architecture: the grid lines indicate the two-qubit interactions which can be performed. (b) An example of concurrent interactions in the 2D NTC architecture: the components connected by the thick red edges indicate concurrent interactions and the thick red circles indicate single-qubit interactions.



2 Definitions 

The one- and two-qubit operations that can be performed by the hardware are called the basic operations. 
The set of basic operations depends on the technology but we shall assume that it is a universal gate set 
which means we can construct any one- or two-qubit unitary from the basic operations. We also assume 
that the basic operations include the ability to perform measurements in the computational basis. 

It is useful to distinguish between physical and logical timesteps. During each physical timestep, we can
perform any set of disjoint basic operations. During a logical timestep, we allow any set of disjoint $t$-qubit
operations to be performed. In this work, we take $t = O(k)$ and assume $k$ is constant.

Definition 2.1 (NANTC). In the $k$D NANTC model, computation is performed by applying a sequence of
sets of basic operations $S_1, \dots, S_d$ to the $k$D grid of qubits. We require that the operations in the set $S_i$ are
disjoint and are either single-qubit operations or two-qubit operations between neighbors in the $k$D grid. The
sequence of sets of operations must be randomized polynomial-time computable from the size n of the input.

In the models where a classical controller is present, the classical controller is invoked after each physical 
timestep to determine which operations to apply at the next step. 

Definition 2.2 (CCAC). Let M be a randomized polynomial-time machine that takes the input x and the
measurement outcomes from the first i physical timesteps and outputs a set $M_1, \dots, M_\ell$ of disjoint basic
operations to be applied to the qubits at the $(i+1)$th physical timestep. If no more physical timesteps are to be
performed, then M outputs a special symbol □. Computation in the CCAC model is performed at physical
timestep i by using M to compute the set of operations to apply and then applying them to the qubits.

The CCNTC model is similar except that it also requires that two-qubit operations are only performed
between neighbors on the $k$D grid.
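
The locality and disjointness requirements shared by the NANTC and CCNTC definitions can be sketched as a simple check on one physical timestep. This is only an illustration: the encoding of an operation as a tuple of one or two integer grid points, and the names `are_neighbors` and `is_valid_ntc_layer`, are assumptions made here and do not appear in the paper.

```python
def are_neighbors(p, q):
    """True if grid points p and q are nearest neighbors on the kD grid."""
    # Integer points are adjacent exactly when their Manhattan distance is 1.
    return len(p) == len(q) and sum(abs(a - b) for a, b in zip(p, q)) == 1


def is_valid_ntc_layer(layer):
    """Check one physical timestep given as a list of operations.

    Each operation is a tuple of one or two grid points (hypothetical encoding).
    """
    used = set()
    for op in layer:
        if len(op) not in (1, 2):
            return False
        if len(op) == 2 and not are_neighbors(*op):
            return False  # two-qubit operation is not along a grid edge
        if any(point in used for point in op):
            return False  # two operations share a qubit
        used.update(op)
    return True
```

In the CCAC model only the disjointness part of this check would apply, since two-qubit operations there may join arbitrary pairs of qubits.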

Definition 2.3 (CCNTC). Let M be a randomized polynomial-time machine that takes the input x and
the measurement outcomes from the first i physical timesteps and outputs a set $M_1, \dots, M_\ell$ of disjoint basic
operations to be applied to the $k$D grid of qubits at the $(i+1)$th physical timestep. We require that each $M_i$ is
either a single-qubit operation or a two-qubit operation between neighbors in the $k$D grid. If no more physical
timesteps are to be performed, then M outputs a special symbol □. Computation in the CCNTC model is
performed at physical timestep i by using M to compute the set of operations to apply and then applying
them to the $k$D grid of qubits.



In this paper, the machine M from Definitions 2.2 and 2.3 will typically be deterministic except for the
pre- and post-processing stages of Shor's algorithm.

For NANTC, a quantum circuit is the sequence of basic operations $M_1, \dots, M_\ell$ to be applied to the $k$D grid
of qubits. This sequence of operations only works for one input size n so we consider families of quantum
circuits. For the CCAC and CCNTC models, a quantum circuit is described by the machine M from
Definitions 2.2 and 2.3. We now define three standard measures of cost in these models.



Definition 2.4. The depth of a quantum circuit is

a) $d$ for NANTC where $S_1, \dots, S_d$ is the sequence of operations from Definition 2.1 for an input of size n

b) $\max_r \max_{x \in \{0,1\}^n} D_x$ for CCAC and CCNTC where $D_x$ is a random variable denoting the number of
physical timesteps it takes for the machine M from Definitions 2.2 and 2.3 to output □. The first max
is taken over all possible random seeds r and the second is over all possible inputs x of length n

We note that the depth only changes by a constant factor if we use logical timesteps instead of physical
timesteps in the above definition. This is due to our assumption that any operation performed in a logical
timestep acts on at most $O(k) = O(1)$ qubits.

Definition 2.5. The size of a quantum circuit is

a) $\sum_{i=1}^{d} |S_i|$ for NANTC where $S_1, \dots, S_d$ is the sequence of operations from Definition 2.1 for an input of
size n

b) $\max_r \max_{x \in \{0,1\}^n} S_x$ for CCAC and CCNTC where $S_x$ is a random variable denoting the total number
of operations applied when the input is x. The first max is taken over all possible random seeds r and
the second is over all possible inputs x of length n

In the next definition, we assume that the qubits are indexed by $\mathbb{N}$ for CCAC.

Definition 2.6. The width of a quantum circuit is

a) the total number of qubits acted on by operations in the sets $S_i$ for NANTC where $S_1, \dots, S_d$ is the
sequence of operations from Definition 2.1 for an input of size n

b) $\max_{x \in \{0,1\}^n} |A_x|$ for CCAC where $A_x$ is the smallest subset of $\mathbb{N}$ such that every qubit acted on is
contained in $A_x$ for input x and all random seeds r

c) $\max_{x \in \{0,1\}^n} |A_x|$ for CCNTC where $A_x$ is the smallest hyperrectangle in $\mathbb{Z}^k$ such that every qubit acted
on is contained in $A_x$ for input x and all random seeds r

Typically, the depth is the most important metric to optimize since it is proportional to the amount of 
time required to execute the quantum operations. The width is also important since the number of qubits is 
currently quite limited but the size is largely irrelevant. Moreover, if parallelism is properly exploited then 
we expect the size to be roughly the depth times the width. 
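
For the non-adaptive case, the three cost measures in parts (a) of Definitions 2.4 to 2.6 can be computed directly from the sequence of operation sets. The sketch below assumes a hypothetical encoding in which each set is a list of operations and each operation is a tuple of the grid points it acts on; the function name `nantc_costs` is likewise only illustrative.

```python
def nantc_costs(layers):
    """Return (depth, size, width) of a NANTC circuit.

    layers: list of timesteps; each timestep is a list of operations and
    each operation is a tuple of the grid points it acts on.
    """
    depth = len(layers)                                  # number of sets S_1..S_d
    size = sum(len(layer) for layer in layers)           # total number of operations
    qubits = {point for layer in layers for op in layer for point in op}
    width = len(qubits)                                  # distinct qubits acted on
    return depth, size, width


layers = [
    [((0, 0), (0, 1)), ((1, 1),)],   # a two-qubit and a single-qubit operation
    [((0, 1), (1, 1))],              # one two-qubit operation
]
print(nantc_costs(layers))  # (2, 3, 3)
```

For CCAC and CCNTC the corresponding quantities are maxima over inputs and random seeds, so they cannot be read off a single fixed sequence in this way.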



3 Quantum teleportation 

In this section we review quantum teleportation [H[7]. As we shall see, teleportation is a useful primitive
that allows non-local interactions to be performed in a constant-depth circuit in $k$D CCNTC. Suppose Alice
has a state $|\psi\rangle^S = \alpha|0\rangle^S + \beta|1\rangle^S$ that she wishes to send to Bob. The two parties are not allowed to send
quantum states to each other but each has one qubit of an EPR pair $|\Phi_{00}\rangle^{AB} = \frac{|00\rangle^{AB} + |11\rangle^{AB}}{\sqrt{2}}$ and they are
allowed to communicate classically.

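The Bell basis consists of the states $|\Phi_{00}\rangle = \frac{|00\rangle + |11\rangle}{\sqrt{2}}$, $|\Phi_{01}\rangle = \frac{|01\rangle + |10\rangle}{\sqrt{2}}$, $|\Phi_{10}\rangle = \frac{|00\rangle - |11\rangle}{\sqrt{2}}$ and $|\Phi_{11}\rangle = \frac{|01\rangle - |10\rangle}{\sqrt{2}}$.
By applying a Hadamard followed by a CNOT, one can transform a computational basis state
$|x_1 x_2\rangle$ to $|\Phi_{x_1 x_2}\rangle$. The Pauli matrices are $\sigma_0 = I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, $\sigma_1 = X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, $\sigma_2 = Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$
and $\sigma_3 = Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$. We also note that $|\Phi_{x_1 x_2}\rangle = X_2^{x_2} Z_1^{x_1} |\Phi_{00}\rangle$ where the subscripts denote which
qubit each operation is applied to.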

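To send $|\psi\rangle$ to Bob, Alice starts by measuring the SA registers in the Bell basis. If the measurement
outcome is $|\Phi_{x_1 x_2}\rangle = \frac{1}{\sqrt{2}}\left(|0, x_2\rangle + (-1)^{x_1} |1, x_2 \oplus 1\rangle\right)$ then (neglecting normalization at intermediate stages) we have

\begin{align}
\langle \Phi_{x_1 x_2}|^{SA} \, |\psi\rangle^{S} |\Phi_{00}\rangle^{AB} &= \left(\langle 0, x_2|^{SA} + (-1)^{x_1} \langle 1, x_2 \oplus 1|^{SA}\right)\left(\alpha|0\rangle^{S} + \beta|1\rangle^{S}\right) |\Phi_{00}\rangle^{AB} \tag{1} \\
&= \left(\alpha \langle x_2|^{A} + (-1)^{x_1} \beta \langle x_2 \oplus 1|^{A}\right) |\Phi_{00}\rangle^{AB} \tag{2} \\
&= \alpha |x_2\rangle^{B} + (-1)^{x_1} \beta |x_2 \oplus 1\rangle^{B} \tag{3} \\
&= X^{x_2} Z^{x_1} |\psi\rangle^{B} \tag{4}
\end{align}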

We have the identity $Y = iXZ$, so we see that after Alice performs her measurement, Bob's state is
$\sigma_k |\psi\rangle$ for some k up to global phase. Since k is determined by Alice's measurement outcome $|\Phi_{x_1 x_2}\rangle$, she
can send k to Bob over a classical channel using two bits. Bob can then obtain the original state $|\psi\rangle$ by
applying $\sigma_k$. The registers SA are left in the state $|\Phi_{x_1 x_2}\rangle$.
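
The teleportation calculation can be checked numerically with a small state-vector sketch. The code projects the S and A registers of $|\psi\rangle^S|\Phi_{00}\rangle^{AB}$ onto each Bell outcome and confirms that Bob recovers $|\psi\rangle$ after undoing $X^{x_2} Z^{x_1}$. The function names `bob_state` and `correct` are hypothetical helpers introduced for this illustration only.

```python
import math


def bob_state(alpha, beta, x1, x2):
    """Bob's normalized qubit after Alice obtains Bell outcome (x1, x2)."""
    psi = (alpha, beta)
    r = 1 / math.sqrt(2)

    def amplitude(s, a, b):
        # Amplitude of |s>^S |a>^A |b>^B in |psi>^S |Phi_00>^{AB}.
        return psi[s] * (r if a == b else 0.0)

    sign = -1 if x1 else 1
    # Project S and A onto (<0,x2| + (-1)^x1 <1,x2 xor 1|) / sqrt(2).
    bob = [r * (amplitude(0, x2, b) + sign * amplitude(1, x2 ^ 1, b))
           for b in (0, 1)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in bob))
    return [c / norm for c in bob]


def correct(state, x1, x2):
    """Undo X^{x2} Z^{x1} by applying X^{x2} first and then Z^{x1}."""
    a, b = state
    if x2:
        a, b = b, a      # Pauli X swaps the two amplitudes
    if x1:
        b = -b           # Pauli Z flips the sign of the |1> amplitude
    return [a, b]


alpha, beta = 0.6, 0.8j
for x1 in (0, 1):
    for x2 in (0, 1):
        recovered = correct(bob_state(alpha, beta, x1, x2), x1, x2)
        assert abs(recovered[0] - alpha) < 1e-12
        assert abs(recovered[1] - beta) < 1e-12
```

Each of the four outcomes occurs with probability 1/4 in this sketch, which is why the unnormalized projected state has norm 1/2 before rescaling.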

Each Bell state can be written as $|\Phi_\ell\rangle^{AB} = \sigma_\ell^{B} |\Phi_{00}\rangle^{AB}$ up to global phase, where $\ell \in \{0, 1, 2, 3\}$ indexes the four Bell states. Suppose that
Alice and Bob started sharing the state $|\Phi_\ell\rangle^{AB}$ instead of $|\Phi_{00}\rangle^{AB}$. Performing a Bell measurement on the
SA registers of $|\psi\rangle^{S} |\Phi_\ell\rangle^{AB}$ is equivalent to measuring SA on $|\psi\rangle^{S} |\Phi_{00}\rangle^{AB}$ and then applying $\sigma_\ell$ to B. Then
if the measurement outcome is k, Bob's state is $\sigma_\ell \sigma_k |\psi\rangle$ which is $\sigma_m |\psi\rangle$ for some m up to global phase.
Bob can then obtain the state $|\psi\rangle$ as before once Alice sends him m.

Let us now consider how quantum teleportation chains can be used in the 1D CCNTC model to perform
non-local operations in constant depth. Suppose that we have a qubit in the state $|\psi\rangle^{S}$ along with m Bell
states $|\Phi_{\ell_j}\rangle^{A_j B_j}$. These are arranged on a line so the overall state is

$$|\psi\rangle^{S} \bigotimes_{j=1}^{m} |\Phi_{\ell_j}\rangle^{A_j B_j} \tag{5}$$

Our goal is to move qubit S to $B_m$. One way to do this is to first teleport S to $B_1$ by performing a Bell
measurement on $S A_1$. We then store the measurement outcome $k_1$ but do not apply the correcting Pauli
operation; at this point, the state of $B_1$ is $\sigma_{\ell_1} \sigma_{k_1} |\psi\rangle$. Continuing this process, we obtain the state

$$\bigotimes_{j=1}^{m} |\Phi_{k_j}\rangle^{B_{j-1} A_j} \; \prod_{j=m}^{1} \left(\sigma_{\ell_j} \sigma_{k_j}\right) |\psi\rangle^{B_m} \tag{6}$$

where we write $B_0 = S$. Since $\prod_{j=m}^{1} (\sigma_{\ell_j} \sigma_{k_j})$ is just a Pauli operation, we obtain the state

$$\bigotimes_{j=1}^{m} |\Phi_{k_j}\rangle^{B_{j-1} A_j} \; |\psi\rangle^{B_m} \tag{7}$$

in a single operation. The crucial point here is that all of the Bell measurements are performed on disjoint
pairs of qubits so they can all be done in parallel as in [14]. Thus, we can perform a non-local interaction
of arbitrary distance in constant depth. It is important to note that this is not possible without a classical
controller since otherwise there is no way to compute the correcting Pauli operation.
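
The work left to the classical controller in a teleportation chain is to combine the recorded Bell-state indices and measurement outcomes into one correcting Pauli. A minimal sketch, assuming Paulis are tracked only up to global phase: each $\sigma_k$ is stored as a pair of bits (X part, Z part), so that multiplication becomes a bitwise XOR. The table `PAULI_BITS` and the function `net_correction` are illustrative names, not part of the paper.

```python
# sigma_0 = I, sigma_1 = X, sigma_2 = Y, sigma_3 = Z as (x_bit, z_bit) pairs.
PAULI_BITS = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
BITS_TO_PAULI = {bits: k for k, bits in PAULI_BITS.items()}


def net_correction(ells, ks):
    """Index m such that the product of all sigma_{l_j} sigma_{k_j} equals
    sigma_m up to global phase.

    ells: indices of the Bell states shared along the chain.
    ks:   Bell measurement outcomes recorded by the controller.
    """
    x = z = 0
    for ell, k in zip(ells, ks):
        for index in (ell, k):
            dx, dz = PAULI_BITS[index]
            x ^= dx   # Paulis commute up to sign, so the order of the
            z ^= dz   # factors does not affect the result up to phase
    return BITS_TO_PAULI[(x, z)]


# Three links that all start in sigma_0|Phi_00>, with outcomes X, Z, X:
print(net_correction([0, 0, 0], [1, 3, 1]))  # 3, i.e. apply Z to the last qubit
```

Because every Pauli is its own inverse, the same index also names the operation that undoes the accumulated error on the final qubit of the chain.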

4 Depth complexity in kD CCNTC

In this section, we show that an arbitrary set of CCAC interactions corresponding to basic operations can be
performed in constant depth in 2D CCNTC. We assume that there are n qubits on which the interactions
are to be performed and store these in the first column of a 2D $n \times n$ CCNTC grid. The qubit at location
$(i, j)$ is denoted by $q_{i,j}$. Since we must handle interactions between qubits that are not neighbors, we may as
well assume that the original n qubits are stored in the first column $q_{0,0}, \dots, q_{0,n-1}$ of qubits. The remaining
columns are used as ancillas to implement teleportation chains. We teleport each of the n qubits horizontally
to the right so that interacting pairs are in adjacent columns. Since these teleportations are on disjoint sets of
qubits, they can be performed in parallel. A second set of vertical teleportation chains is then used to move
all the qubits down to the first row. At this point, the interacting qubits are neighbors so the interactions
may be implemented directly. We then perform the reverse teleportations to move the qubits back to their
original positions.



4.1 An example of arbitrary interactions in 2D CCNTC 

We show an example in Figure 2. The desired interactions are shown in Figure 2a. The layout of the data
qubits in the 2D grid is shown in Figure 2b; the ancilla qubits are used to implement the teleportation chains
and are initially set to $|0\rangle$. We start by horizontally teleporting the qubits that interact to adjacent columns
in Figure 2c where the teleportation chains are denoted by the dotted red arrows. The red double arrow
indicates a swap operation which is a less expensive way of achieving the same result when the qubits are
neighbors. The next step is to vertically teleport the data qubits down to the first row as shown in Figure 2d.
Finally, all interacting qubits are now neighbors so we perform the desired interactions in Figure 2e. The
final reverse teleportations are not shown but can be obtained by reversing the arrows in Figures 2c and 2d.





Figure 2: Performing an arbitrary set of interactions in 2D CCNTC (panels (a) to (e)). The qubits crosshatched green are the
data qubits and the qubits shaded with diagonal downward blue lines are ancilla qubits.



4.2 An algorithm for performing arbitrary interactions in 2D CCNTC 

In order to define our algorithm, we first show how to perform an arbitrary reordering of the positions of the
qubits in constant depth. We assume that there are n data qubits which are located in the first column of
the $n \times n$ grid; the remaining qubits are in the state $|0\rangle$. We let $T \subseteq \{0, \dots, n-1\}$ be a subset of row indexes
on which an injection $\pi : T \to \{0, \dots, n-1\}$ is to be applied. This injection describes where the qubits with
row indexes in T are to be moved to on the x-axis. We specify T explicitly because this allows
us to only perform teleportations on qubits which have row indexes in T. If $|T| = o(n)$ then this can result
in a circuit that has asymptotically smaller size. The reordering can be applied using Algorithm 1 which is
based on the same technique as Figure 2. The notation teleport($q_{i_1,j_1}$, $q_{i_2,j_2}$) where $i_1 = i_2$ or $j_1 = j_2$ means
that a teleportation chain is applied to move the state of the qubit at $(i_1, j_1)$ along the line to $(i_2, j_2)$.



Algorithm 1 The algorithm for performing an arbitrary reordering of a subset of the qubits in 2D CCNTC
Require: The n data qubits are in the first column, $T \subseteq \{0, \dots, n-1\}$ and $\pi : T \to \{0, \dots, n-1\}$ is an
    injection. For all $j \in T$ such that $\pi(j) = 0$, $\{k \in T^c \mid k < j\} = \emptyset$
Ensure: Each qubit at $(0, j)$ is moved to $(\pi(j), 0)$ for all $j \in T$

function Reorder($T$, $\pi$)
    for $j \in T$ do
        teleport($q_{0,j}$, $q_{\pi(j),j}$)
    end for
    for $j \in T$ do
        teleport($q_{\pi(j),j}$, $q_{\pi(j),0}$)
    end for
end function
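
The two loops of Algorithm 1 can be mirrored by classical bookkeeping that lists the teleportation chains and checks that the chains within each phase use disjoint grid cells, which is what allows each phase to run in constant depth. This sketch uses the (column, row) coordinate convention of Theorem 1.5; the helper names `reorder_plan`, `chain_cells` and `chains_disjoint` are assumptions introduced for this illustration.

```python
def reorder_plan(n, T, pi):
    """Horizontal and vertical chains of Algorithm 1 as (source, target) pairs."""
    targets = [pi[j] for j in T]
    if len(set(targets)) != len(targets):
        raise ValueError("pi must be an injection on T")
    if any(not 0 <= t < n for t in targets) or any(not 0 <= j < n for j in T):
        raise ValueError("indexes must lie in {0, ..., n-1}")
    horizontal = [((0, j), (pi[j], j)) for j in T]      # along row j
    vertical = [((pi[j], j), (pi[j], 0)) for j in T]    # down column pi(j)
    return horizontal, vertical


def chain_cells(src, dst):
    """Grid cells on the straight line from src to dst, inclusive."""
    (x0, y0), (x1, y1) = src, dst
    if x0 == x1:
        step = 1 if y1 >= y0 else -1
        return [(x0, y) for y in range(y0, y1 + step, step)]
    if y0 == y1:
        step = 1 if x1 >= x0 else -1
        return [(x, y0) for x in range(x0, x1 + step, step)]
    raise ValueError("chains must be horizontal or vertical")


def chains_disjoint(chains):
    """True if no grid cell is used by two different chains."""
    seen = set()
    for src, dst in chains:
        cells = set(chain_cells(src, dst))
        if cells & seen:
            return False
        seen |= cells
    return True


horizontal, vertical = reorder_plan(4, [1, 3], {1: 2, 3: 1})
assert chains_disjoint(horizontal) and chains_disjoint(vertical)
```

Horizontal chains lie in distinct rows because the elements of T are distinct, and vertical chains lie in distinct columns because $\pi$ is an injection. The sketch does not model data qubits outside T that remain in column 0, which is what the extra requirement on indexes with $\pi(j) = 0$ addresses.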



Our main technical result follows immediately from Algorithm 1.

Theorem 1.5. Suppose we have an $n \times n$ grid where all qubits except those in the first column are in the
state $|0\rangle$. Let $T \subseteq \{0, \dots, n-1\}$ and let $\pi : T \to \{0, \dots, n-1\}$ be an injection such that for all $j \in T$ with
$\pi(j) = 0$, $\{k \in T^c \mid k < j\} = \emptyset$. Set $m = |\{j \in T \mid \pi(j) \ne 0\}|$. Then we can move each qubit at $(0, j)$ to
$(\pi(j), 0)$ for all $j \in T$ in $O(1)$ depth, $O(mn)$ size and $(m+1)n \le n^2$ width in 2D CCNTC.

It is now straightforward to describe the algorithm for performing arbitrary interactions. We first note
that an arbitrary set of interactions can be defined by disjoint one and two element subsets $J_k$ of $\{0, \dots, n-1\}$
and basic operations $M_k$ where $1 \le k \le \ell$ and the values in $J_k$ denote the qubits on which the operation $M_k$ is
to be applied. The pseudocode for performing arbitrary interactions in 2D CCNTC is shown in Algorithm 2.



Algorithm 2 The algorithm for performing arbitrary interactions in 2D CCNTC
Require: The n data qubits are in the first column, each $J_k$ is a disjoint one or two element subset of
    $\{0, \dots, n-1\}$ and $M_k$ is a basic operation for $1 \le k \le \ell$. Moreover, $|J_{k_1}| \le |J_{k_2}|$ for $k_1 < k_2$
Ensure: The interactions specified by $J_k$ and $M_k$ are applied

function Interact($J_1, \dots, J_\ell$, $M_1, \dots, M_\ell$)
    $T := ()$
    $i := 0$
    for $k := 1, \dots, \ell$ do
        if $|J_k| = 1$ then
            $i := 1$
        else
            $\{j_1, j_2\} := J_k$ where $j_1 < j_2$
            $\pi(j_1) := i$
            $\pi(j_2) := i + 1$
            Append the elements of $J_k$ to $T$
            $i := i + 2$
        end if
    end for
    Reorder($T$, $\pi$)
    $i := 0$
    for $k := 1, \dots, \ell$ do
        if $|J_k| = 1$ then
            $\{j\} := J_k$
            Apply $M_k$ to $q_{0,j}$
            $i := 1$
        else
            Apply $M_k$ to $q_{i,0}$, $q_{i+1,0}$
            $i := i + 2$
        end if
    end for
    Perform the reverse teleportations to move the qubits back to their original positions
end function
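
The first loop of Algorithm 2, which turns the interaction sets into the inputs of Reorder, can be sketched as follows. The function name `build_reordering` and the representation of each $J_k$ as a Python set of row indexes are assumptions made for illustration; the input is expected to list the one-element sets first, as the algorithm requires.

```python
def build_reordering(J):
    """Compute T and pi from the interaction sets, following Algorithm 2.

    J: list of sets of row indexes, each of size 1 or 2, with all
       one-element sets listed before the two-element sets.
    """
    T, pi = [], {}
    i = 0
    for Jk in J:
        if len(Jk) == 1:
            i = 1            # column 0 stays reserved for qubits that do not move
        else:
            j1, j2 = sorted(Jk)
            pi[j1] = i       # the pair is sent to adjacent columns i and i + 1
            pi[j2] = i + 1
            T.extend([j1, j2])
            i += 2
    return T, pi


T, pi = build_reordering([{2}, {0, 3}, {1, 4}])
print(T)   # [0, 3, 1, 4]
print(pi)  # {0: 1, 3: 2, 1: 3, 4: 4}
```

In this example the single-qubit operation on row 2 is applied in place, while the two pairs end up in columns (1, 2) and (3, 4) of the first row, where they are nearest neighbors.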



The following theorem is a direct consequence of Algorithm 2.

Theorem 1.1. Suppose that C is a CCAC quantum circuit with depth d, size s and width n. Then C can
be simulated in $O(d)$ depth, $O(sn)$ size and $n^2$ width in 2D CCNTC.

The rest of our results for $k$D CCNTC follow from Theorem 1.1. Let $\mathcal{D}_n$ denote the set of all $n \times n$
density matrices. A general quantum operation is represented as a completely positive trace preserving
(CPTP) map $\mathcal{E} : \mathcal{D}_n \to \mathcal{D}_n$. Any circuit in the 2D CCNTC model can also be applied when
arbitrary interactions are allowed. The following corollary is immediate.

Corollary 1.2. Let $\mathcal{E} : \mathcal{D}_n \to \mathcal{D}_n$ be a CPTP map and let $\epsilon > 0$. Let $d_1$ and $d_2$ be the minimum depths
required to implement $\mathcal{E}$ with error at most $\epsilon$ in the CCAC and $k$D CCNTC models respectively where $k \ge 2$.
Then $d_1 = \Theta(d_2)$.

It is known that Shor's algorithm can be implemented in constant depth, polynomial size and polynomial 
width in CCAC [3] from which we obtain another corollary. 

Corollary 1.3. Shor's algorithm can be implemented in constant depth, polynomial size and polynomial
width in 2D CCNTC.

Because controlled-U operations and fan-outs with unbounded numbers of control qubits or targets can
be performed in constant depth, polynomial size and polynomial width in CCAC [5, 3, 13], we have the
following result.

Corollary 1.4. Controlled-U operations with n controls and fan-outs with n targets can be implemented in 
constant depth, poly(n) size and poly(n) width in 2D CCNTC. 



5 Controlled operations in kD NANTC 

In this section, we show how to control a single-qubit U operation by n controls in $O(\sqrt[k]{n})$ depth in
$k$D NANTC. We start with an $m \times m$ grid; for reasons that will become clear later, we require that m is odd.
The control qubits are placed such that they are not at adjacent grid points; the central 3 × 3 square has
no controls except when $m = 3$. This is illustrated in Figures 3a, 4a, 5a and 6a for the cases where $m = 3$,
$m = 5$, $m = 7$ and $m = 9$. Let c be the center of the grid which corresponds to the target qubit. The circuit
works by considering each square ring in the grid with center c (i.e., a set of points in the grid that all have
the same distance to the center under the $\ell_\infty$ norm $\|(x, y)\|_\infty = \max\{|x|, |y|\}$). We start with the outermost
such ring and propagate its control values into the next ring. At each such step, some of the control values
are combined so that all the values can fit into the smaller ring. This continues until we reach a 3 × 3 ring
at which point we apply a special sequence of operations to finish applying the controlled operation to the
central qubit. We will show that each stage can be implemented in constant depth, so the overall depth is
$O(\sqrt{n})$.



5.1 The base case: the 3x3 grid 

We now describe how this circuit works in greater detail. First, consider the case where $m = 3$. The grid
starts as shown in Figure 3a; note that we do not force the central 3 × 3 to be devoid of controls in this case
since this is the entire grid. All ancilla qubits start in the state $|0\rangle$. We start by setting the lower left and
upper right corner ancilla qubits to the ANDs of their neighboring controls as shown in Figure 3b. Both of
these operations are disjoint, so this can be done in one logical timestep. The next step is to swap these two
corner qubits with the vertical middle qubits so they can interact with the central target qubit; this is done
in Figure 3c. Finally, we apply a U operation to the target qubit and control by the two middle qubits in
Figure 3d.

At this point, the target qubit has the desired value; however, there are two other ancilla qubits in
Figure 3d that must have their values uncomputed. This is done by applying the operations of Figures 3b and 3c
in reverse order.
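
On computational basis states, the 3 × 3 base case can be traced with ordinary bits. The sketch below follows the index pattern of the 3 × 3 branch of Algorithm 3 with $k = 0$ and takes $U = X$, so that the controlled operation is a classical bit flip; it is a check of the logic on basis states only, not a quantum simulation. The function name `run_3x3` and the dictionary representation of the grid are assumptions made here.

```python
from itertools import product


def run_3x3(controls, target):
    """Trace the 3 x 3 circuit on classical bits with U = X.

    controls: four bits placed at (0,1), (1,0), (1,2), (2,1).
    Returns the final value stored at every grid point.
    """
    q = {(x, y): 0 for x in range(3) for y in range(3)}
    for pos, bit in zip([(0, 1), (1, 0), (1, 2), (2, 1)], controls):
        q[pos] = bit
    q[(1, 1)] = target

    def compute_corners():
        # Each corner ancilla receives the AND of its two neighboring controls.
        q[(0, 0)] ^= q[(0, 1)] & q[(1, 0)]
        q[(2, 2)] ^= q[(1, 2)] & q[(2, 1)]

    def swap_to_middle():
        q[(0, 0)], q[(0, 1)] = q[(0, 1)], q[(0, 0)]
        q[(2, 1)], q[(2, 2)] = q[(2, 2)], q[(2, 1)]

    compute_corners()
    swap_to_middle()
    q[(1, 1)] ^= q[(0, 1)] & q[(2, 1)]   # controlled-X on the central target
    swap_to_middle()                     # uncompute: undo the swaps ...
    compute_corners()                    # ... and then the ANDs
    return q


for controls in product((0, 1), repeat=4):
    final = run_3x3(controls, target=0)
    assert final[(1, 1)] == int(all(controls))
    assert final[(0, 0)] == 0 and final[(2, 2)] == 0
```

The loop confirms that the target flips exactly when all four controls are 1 and that both corner ancillas return to 0 after uncomputation.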



Figure 3: A controlled operation on a 3 × 3 grid (panels (a) to (d)). The qubits crosshatched green are the data qubits, the
qubits shaded with diagonal upward orange lines are ancilla qubits which store intermediate data and the
qubits shaded with diagonal downward blue lines are ancilla qubits which are currently unused.

5.2 An example of the general case: the 5 × 5 grid

We now consider an example of the general case where $m = 5$ as shown in Figure 4a. The first step is to
propagate the values of the outer ring inwards; since the inner ring is 3 × 3, there are no controls in the inner
ring so this can be done as shown in Figure 4b. We then rotate the inner ring as in Figure 4c. At this point,
the remaining operations to perform are the same as in the 3 × 3 case and are shown in the remaining panels of Figure 4.
The target qubit then has the desired value so we uncompute the intermediate ancillas by applying the
preceding operations, apart from the final controlled operation, in reverse order.

The same idea applies to an $m \times m$ grid except that when the inner rings have controls (i.e. for $m \ge 7$),
the controls from the outer ring must be combined with those in the inner ring at the same time they are
propagated inwards. The basic idea is the same as for the 5 × 5 case. See the appendix for examples of the
7 × 7 and 9 × 9 cases.



5.3 An algorithm for controlled-U operations in $O(\sqrt{n})$ depth in 2D NANTC

We now present the algorithm used in Figures 3 to 6 for the general $m \times m$ grid. Let us now consider an odd
$m \ge 3$. We denote the coordinates of the qubits on this grid by $(x, y)$ where $0 \le x, y < m$. Let G be the set
$\{0, \dots, m-1\}^2$ of all points on the grid and let $c = ((m-1)/2, (m-1)/2)$ be the central point. As discussed
previously, the geometry induced by the $\ell_\infty$ norm is useful for reasoning about this grid. From now on, all
distances in this subsection are understood to be with respect to the $\ell_\infty$ norm.

We will say that the $k$th ring is the set of points that have distance $(m-1)/2 - k$ to c so the zeroth ring
is outermost; we denote by $R_k = (r^k_0, \dots, r^k_{\ell_k - 1})$ the points of the $k$th ring where $\ell_k = |R_k|$, $r^k_0$ is the bottom left corner
and the rest of the points are in clockwise order.
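
The ring structure can be made concrete by enumerating the points of a ring. The sketch below starts at the corner $(k, k)$ and walks the four sides using the corner and direction order that appears in Algorithm 4; the function name `ring_points` is an assumption, and the code only illustrates the geometry, not the circuit.

```python
def ring_points(m, k):
    """Points of the k-th ring of an m x m grid (m odd), starting at (k, k)."""
    side = m - 2 * k                 # side length of the square ring
    if side == 1:
        return [(k, k)]              # the innermost "ring" is the center itself
    lo, hi = k, m - k - 1
    corners = [(lo, lo), (lo, hi), (hi, hi), (hi, lo)]
    directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    points = []
    for (x, y), (dx, dy) in zip(corners, directions):
        # Walk one side, stopping just before the next corner.
        for step in range(side - 1):
            points.append((x + step * dx, y + step * dy))
    return points


m, k = 5, 0
center = (m - 1) // 2
ring = ring_points(m, k)
assert len(ring) == 4 * (m - 2 * k - 1)
assert all(max(abs(x - center), abs(y - center)) == center - k for x, y in ring)
```

Every point returned has $\ell_\infty$ distance $(m-1)/2 - k$ from the center, and a ring of side length $m - 2k$ has $4(m - 2k - 1)$ points.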

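The ring $R_k$ contains $4\left(\frac{m-1}{2} - k\right)$ controls when $m - 2k > 3$, so the entire grid has $n = 4\sum_{3 < m - 2k \le m} \left(\frac{m-1}{2} - k\right) =
\frac{1}{2}(m^2 - 9)$ controls for $m > 3$. In the case where $m = 3$, there are 4 controls. Since there are $O(m)$ rings, each handled in constant depth, and $m = \Theta(\sqrt{n})$, it is indeed the
case that the depth is $O(\sqrt{n})$.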
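
The count of controls can be verified by direct summation. Assuming, as stated above, that a ring of side length $m - 2k > 3$ holds $4\left(\frac{m-1}{2} - k\right)$ controls and the central 3 × 3 square holds none, the sum evaluates to $(m^2 - 9)/2$ for odd $m > 3$. The function names below are illustrative only.

```python
def controls_in_ring(m, k):
    """Controls in ring k; the central 3 x 3 square has none when m > 3."""
    return 4 * ((m - 1) // 2 - k) if m - 2 * k > 3 else 0


def total_controls(m):
    """Total number of controls on an m x m grid with m odd."""
    if m == 3:
        return 4                      # special case: the grid is a single 3 x 3 ring
    return sum(controls_in_ring(m, k) for k in range((m - 1) // 2))


for m in range(5, 23, 2):
    assert total_controls(m) == (m * m - 9) // 2

print(total_controls(5), total_controls(7), total_controls(9))  # 8 20 36
```

Since $n$ grows quadratically in $m$ while the number of rings grows linearly, processing each ring in constant depth gives the $O(\sqrt{n})$ bound.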

We denote by $q_{i,j}$ the value stored at the point $(i, j)$ and assume the operation to apply to the target
is U. The notation CU($y$, $x_1, \dots, x_\ell$) denotes applying a controlled-U operation to qubit y conditional on
$x_1, \dots, x_\ell$. To apply a swap operation to qubits x and y, we write swap($x$, $y$). The pseudocode for the main
algorithm is shown in Algorithm 3; the auxiliary functions are shown in Algorithm 4.
Algorithm 3 The algorithm for implementing a controlled-U operation on an $m \times m$ grid 

Require: $m$ is odd 
Ensure: A controlled-U operation is applied to the target 

function Control($m$) 
    $k := 0$ 
    while $m - 2k \ge 3$ do 
        Control-Stage($k$) 
        $k := k + 1$ 
    end while 
    Uncompute the intermediate ancillas by repeating all operations except for the final CU operation in reverse order 
end function 

function Control-Stage($k$)    ▷ $k$ is the depth in the recursive call; at the top level, $k = 0$ 
    if $k > 0$ then 
        Control-Clockwise($k$) 
        Rotate($k$) 
    end if 
    if $m - 2k = 3$ then    ▷ In this case, we have a $3 \times 3$ grid 
        $q_{k,k} \leftarrow q_{k,k} \oplus q_{k,k+1} \wedge q_{k+1,k}$ 
        $q_{k+2,k+2} \leftarrow q_{k+2,k+2} \oplus q_{k+1,k+2} \wedge q_{k+2,k+1}$ 
        swap($q_{k,k}$, $q_{k,k+1}$) 
        swap($q_{k+2,k+1}$, $q_{k+2,k+2}$) 
        CU($q_{k+1,k+1}$, $q_{k,k+1}$, $q_{k+2,k+1}$) 
    end if 
end function 
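The $3 \times 3$ stage of Algorithm 3 can be checked on classical basis states. The sketch below is only an illustration under two assumptions that are not in the paper: the operation $U$ is taken to be a bit flip ($U = X$), and the grid is stored as a dictionary from coordinates to bits. The helper names `base_case_3x3` and `make_grid` are hypothetical.

```python
def base_case_3x3(q, k=0):
    """Classically simulate the 3 x 3 stage on a dict (i, j) -> bit, with U = X."""
    q = dict(q)
    # Two corner ancillas each receive the AND of their two neighbouring values.
    q[k, k] ^= q[k, k + 1] & q[k + 1, k]
    q[k + 2, k + 2] ^= q[k + 1, k + 2] & q[k + 2, k + 1]
    # The swaps move both partial results next to the centre.
    q[k, k], q[k, k + 1] = q[k, k + 1], q[k, k]
    q[k + 2, k + 1], q[k + 2, k + 2] = q[k + 2, k + 2], q[k + 2, k + 1]
    # CU with U = X: flip the centre iff both partial results are 1.
    q[k + 1, k + 1] ^= q[k, k + 1] & q[k + 2, k + 1]
    return q


def make_grid(a, b, c, d):
    """3 x 3 grid with the four mid-side values set and everything else 0."""
    q = {(i, j): 0 for i in range(3) for j in range(3)}
    q[0, 1], q[1, 0], q[1, 2], q[2, 1] = a, b, c, d
    return q
```

With the corners and the centre initialised to 0, the centre bit ends up equal to the AND of the four mid-side bits, which is the behaviour expected of a four-control Toffoli gate.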



Algorithm 4 The Rotate and Control-Clockwise operations 

function Control-Clockwise($k$) 
    $C = ((k, k), (k, m - k - 1), (m - k - 1, m - k - 1), (m - k - 1, k))$    ▷ The corners of $R_k$ 
    $D = ((0, 1), (1, 0), (0, -1), (-1, 0))$    ▷ The directions to follow between the corners of $R_k$ 
    for $i := 0, \ldots, 3$ do 
        $i_- := i - 1 \bmod 4$ 
        $i_+ := i + 1 \bmod 4$ 
        $q_{C_i} \leftarrow q_{C_i} \oplus q_{C_i - D_i} \wedge q_{C_i + D_{i_-}}$    ▷ Compute the corner ancilla 
        Let $L = (s_0, \ldots, s_{\ell_k/4 - 1})$ be the points in $R_k$ from $C_i$ to $C_{i_+}$ excluding $C_{i_+}$ 
        $j := 2$ 
        while $j < \ell_k/4 - 1$ do    ▷ Store the AND of two values in each ancilla in $L$ except for the last 
            $q_{L_j} \leftarrow q_{L_j} \oplus q_{L_j - D_i} \wedge q_{L_j + D_{i_-}}$ 
            $j := j + 2$ 
        end while 
        $P := L_{\ell_k/4 - 1}$ 
        if $m - 2k > 3$ then    ▷ For the last ancilla, use three controls unless we have a $5 \times 5$ grid 
            $q_P \leftarrow q_P \oplus q_{P - D_i} \wedge q_{P + D_{i_-}} \wedge q_{P + D_i}$ 
        else 
            $q_P \leftarrow q_P \oplus q_{P - D_i} \wedge q_{P + D_{i_-}}$ 
        end if 
    end for 
end function 

function Rotate($k$) 
    $i := 1$ 
    while $i < \ell_k$ do 
        $i_+ := i + 1 \bmod \ell_k$ 
        swap($q_{r^k_i}$, $q_{r^k_{i_+}}$) 
        $i := i + 2$ 
    end while 
end function 
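The Rotate step is easiest to see on a plain list standing in for the values stored along one ring. The following sketch assumes the ring is given in the order $r_0, r_1, \ldots$ and has even length; the function name `rotate` and the list representation are illustrative and not part of the paper.

```python
def rotate(ring):
    """Swap the values at positions (1, 2), (3, 4), ..., (len - 1, 0).

    The pairs are disjoint when the ring has even length, so all swaps
    could be applied in a single time step.
    """
    ring = list(ring)
    n = len(ring)
    i = 1
    while i < n:
        j = (i + 1) % n
        ring[i], ring[j] = ring[j], ring[i]
        i += 2
    return ring
```

On a ring of eight values labelled 0 to 7, every value at an odd position moves one place forward and every value at an even position moves one place back, with the last pair wrapping around to position 0.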



The following theorem is an immediate consequence of Algorithm 3. 

Theorem 5.1. Controlled-U operations with $n$ controls have depth $O(\sqrt{n})$, size $O(n)$ and width $O(n)$ in 2D 
NANTC. 

5.4 Generalization to kD NANTC 

In this section, we discuss how the circuit can be generalized to $k$ dimensions. The algorithm works in the 
same way except the ring is replaced by the grid points on the surface of the hypercube formed by the 
points at distance $(m - 1)/2 - k$ from the center $c$ of the grid. We proceed as before and propagate the 
controls on $R_k$ into $R_{k+1}$ until we obtain a grid of width 3. Since the number of controls on a $k$D grid of 
length $m$ is $\Theta(m^k)$, we obtain a circuit of depth $O(\sqrt[k]{n})$ for implementing a controlled-U operation with $n$ 
controls. The constant depends on $k$, but we assumed that $k$ is constant in Section 2. From this, we obtain 
the following result. 

Theorem 5.2. Controlled-U operations with $n$ controls have depth $O(\sqrt[k]{n})$, size $O(n)$ and width $O(n)$ in 
$k$D NANTC. 
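The scaling behind the $k$D bound can be illustrated by counting grid points per hypercube surface. The sketch below uses `dim` for the dimension and `j` for the shell index to avoid the overloaded symbol $k$; both names, and the functions themselves, are illustrative assumptions rather than notation from the paper.

```python
def shell_size(m, dim, j):
    """Number of grid points on the j-th hypercube surface of an m^dim grid."""
    side = m - 2 * j
    if side == 1:
        return 1  # the single central point
    return side ** dim - (side - 2) ** dim


def num_shells(m):
    """Number of nested surfaces in a grid of odd side length m."""
    return (m + 1) // 2
```

The shells partition all $m^{\text{dim}}$ points while there are only about $m/2$ of them, so a procedure that spends constant depth per shell has depth $O(m)$, which is $O(\sqrt[k]{n})$ when $n = \Theta(m^k)$.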




6 Fan-out operations 



In this section, we describe quantum circuits for fan-out. In this case, we have a single control qubit and 
our goal is to XOR it into each of the target qubits. The construction of fan-out circuits is adapted from 
Algorithm 3; the circuits are the same except that the qubit that was the target becomes the control qubit and 
qubits that were the controls become the targets. Let $n$ be the number of targets. In the case of the circuit of 
Section 5, we simply apply all operations in reverse order and replace each Toffoli gate $y \leftarrow y \oplus x_1 \wedge \cdots \wedge x_n$ 
with a fan-out operation $x_j \leftarrow x_j \oplus y$ for all $1 \le j \le n$. This yields a $k$D NANTC fan-out circuit of depth 
$O(\sqrt[k]{n})$. We have shown the following. 

Theorem 6.1. Fan-outs to $n$ targets have depth $O(\sqrt[k]{n})$, size $O(n)$ and width $O(n)$ in $k$D NANTC. 
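For reference, the two gate types exchanged in this construction act on classical basis states as follows. This is a minimal sketch on bits only; the function names `toffoli` and `fan_out` are illustrative.

```python
def toffoli(y, controls):
    """Return y XOR (x_1 AND ... AND x_n) for a list of control bits."""
    return y ^ int(all(controls))


def fan_out(y, targets):
    """Return the targets after XORing the control bit y into each of them."""
    return [x ^ y for x in targets]
```

Both operations are their own inverse on basis states, which is consistent with running the circuit in reverse order.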

7 Optimality 

In this section, we prove that the depth, size and width of the circuits generated by Algorithm 3 (and its $k$D 
generalization) are optimal for NANTC. A similar lower bound for addition is discussed in [5]. These lower 
bounds hold regardless of where the controls and target qubits are located on the $k$D grid. They also hold 
for a more general class of operations that contains the controlled-U operations and fan-outs. 

We first note that each qubit is acted on by a constant number of operations in Algorithm 3. This 
implies that the size of the circuit is $O(n)$. This is clearly optimal (even for adaptive circuits with arbitrary 
interactions) since any circuit that implements a controlled operation must act on each of the controls. 

Theorem 7.1. Any NANTC quantum circuit that implements a non-trivial controlled-U operation with $n$ 
controls has size $\Omega(n)$. 

Consider a nonempty set $S \subseteq \mathbb{Z}^k$ of $\Omega(n)$ points. Using a simple combinatorial argument, it is easy to 
prove the following: 

Lemma 7.2. For every $y \in S$, $\|x - y\|_1 = \Omega(\sqrt[k]{n})$ for some $x \in S$. 
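One way to see the counting argument behind Lemma 7.2 is that an $\ell_1$ ball of radius $r$ in $\mathbb{Z}^k$ fits inside a cube of side $2r + 1$ and so contains at most $(2r + 1)^k$ points; a set with more points than that cannot fit inside the ball around $y$. The sketch below checks this by brute force for small cases. The function names are illustrative and `dim` stands for the dimension $k$.

```python
from itertools import product


def l1_ball_size(dim, r):
    """Count the integer points within l1 distance r of the origin."""
    return sum(
        1
        for p in product(range(-r, r + 1), repeat=dim)
        if sum(map(abs, p)) <= r
    )


def guaranteed_distance(dim, size):
    """Largest r with (2r + 1)**dim < size, or -1 if there is none.

    Any set of `size` points in Z^dim then contains a point at l1 distance
    strictly greater than r from any fixed point y.
    """
    r = -1
    while (2 * (r + 1) + 1) ** dim < size:
        r += 1
    return r
```

Since $(2r + 1)^k < |S|$ holds for all $r$ up to roughly $|S|^{1/k}/2$, a set of $\Omega(n)$ points must contain a point at distance $\Omega(\sqrt[k]{n})$ from $y$.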

The trace norm of a density matrix $\rho$ (denoted $\|\rho\|_{tr}$) is equal to $(1/2)\mathrm{tr}|\rho|$ (the $(1/2)$ factor ensures 
that $\|\rho - \sigma\|_{tr}$ is the probability of distinguishing $\rho$ and $\sigma$ with the best possible measurement). Consider a 
general quantum operation $\mathcal{E} : \mathcal{D}_n \to \mathcal{D}_n$ represented as a CPTP map. We will use an operator version of 
the trace norm defined by $\|\mathcal{E}\|_{tr} = \sup_{\rho \in \mathcal{D}_n} \|\mathcal{E}(\rho)\|_{tr}$. If $\mathcal{E}_1$ and $\mathcal{E}_2$ are two CPTP maps then $\|\mathcal{E}_1 - \mathcal{E}_2\|_{tr}$ is the 
probability of distinguishing between them on the worst possible input. Thus, it is a measure of how much 
these operations differ. We will also make use of the partial trace. If $x$ is a qubit, then we will denote the 
partial trace over all qubits except $x$ by $\mathrm{tr}_{\neg x} = \mathrm{tr}_{\mathbb{Z}^k \setminus \{x\}}$. 
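For a single qubit the quantity $(1/2)\mathrm{tr}|A|$ can be computed in closed form from the two eigenvalues of a Hermitian matrix. The sketch below is a small illustration restricted to $2 \times 2$ matrices given as nested lists; the function names are assumptions made for this example.

```python
import math


def trace_norm_2x2(a):
    """Return (1/2) tr|A| for a 2 x 2 Hermitian matrix A given as nested lists."""
    mean = (a[0][0].real + a[1][1].real) / 2
    half_gap = math.sqrt(((a[0][0].real - a[1][1].real) / 2) ** 2 + abs(a[0][1]) ** 2)
    # The eigenvalues of A are mean + half_gap and mean - half_gap.
    return (abs(mean + half_gap) + abs(mean - half_gap)) / 2


def trace_distance(rho, sigma):
    """Trace norm of the difference of two single-qubit density matrices."""
    diff = [[rho[i][j] - sigma[i][j] for j in range(2)] for i in range(2)]
    return trace_norm_2x2(diff)
```

Two orthogonal pure states are at distance 1, identical states are at distance 0, and a pure state is at distance 1/2 from the maximally mixed state.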

Using this notation, controlled-U operations are a special case of a more general class of operations. 

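Definition 7.3. Let $\mathcal{E} : \mathcal{D}_n \to \mathcal{D}_n$ be a CPTP map. We say that $\mathcal{E}$ is $\epsilon$-input sensitive if there exists a 
qubit $y$ such that for $\Omega(n)$ qubits $x$, there exists a CPTP map $\mathcal{F} : \mathcal{D}_n \to \mathcal{D}_n$ acting only on $x$ such that 
$\|\mathrm{tr}_{\neg y}(\mathcal{E}\mathcal{F} - \mathcal{E})\|_{tr} \ge \epsilon$. 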

Intuitively, an $\epsilon$-input sensitive operation is a generalization of a Toffoli gate where modifying some input 
qubit $x$ yields a different value on the output with probability $\epsilon$. Similarly, we can define $\epsilon$-output sensitive 
operations which can be thought of as generalizations of fan-out. 

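Definition 7.4. Let $\mathcal{E} : \mathcal{D}_n \to \mathcal{D}_n$ be a CPTP map. We say that $\mathcal{E}$ is $\epsilon$-output sensitive if there exists a 
qubit $x$ such that for $\Omega(n)$ qubits $y$, there exists a CPTP map $\mathcal{F} : \mathcal{D}_n \to \mathcal{D}_n$ acting only on $x$ such that 
$\|\mathrm{tr}_{\neg y}(\mathcal{E}\mathcal{F} - \mathcal{E})\|_{tr} \ge \epsilon$. 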

We say that $\mathcal{E}$ is $\epsilon$-sensitive if it is $\epsilon$-input or $\epsilon$-output sensitive. A family $\{\mathcal{E}_n : \mathcal{D}_n \to \mathcal{D}_n\}$ of CPTP maps 
is $\epsilon$-sensitive if every $\mathcal{E}_n$ is $\epsilon$-sensitive (however, note that $\mathcal{E}_n$ might be either input or output sensitive depending on 
the value of $n$). Our lower bounds will apply to all families of $\epsilon$-sensitive operations. All proofs will be for 
the case of $\epsilon$-input sensitive operations but the argument for $\epsilon$-output sensitive operations is all but identical. 
Theorem 7.5. Let $\{\mathcal{E}_n : \mathcal{D}_n \to \mathcal{D}_n\}$ be a family of $\epsilon$-sensitive operations. Then any family of $k$D NANTC 
circuits $\{C_n\}$ such that $\|\mathcal{E}_n - C_n\|_{tr} < \epsilon/2$ for all $n$ has size $\Omega(n)$. 

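Proof. Suppose that $C_n$ has size $o(n)$. Assume $\mathcal{E}_n$ is $\epsilon$-input sensitive and choose a qubit $y$ as in 
Definition 7.3 (the case where it is $\epsilon$-output sensitive is very similar). There are $\Omega(n)$ qubits $x$ such that 
there exists a CPTP map $\mathcal{F} : \mathcal{D}_n \to \mathcal{D}_n$ acting only on $x$ such that $\|\mathrm{tr}_{\neg y}(\mathcal{E}_n\mathcal{F} - \mathcal{E}_n)\|_{tr} \ge \epsilon$. For large $n$, 
there is such an $x$ which is not acted on by $C_n$. Then $\mathrm{tr}_{\neg y} C_n\mathcal{F} = \mathrm{tr}_{\neg y} C_n$. Now 

$$\|\mathrm{tr}_{\neg y}(C_n - \mathcal{E}_n)\|_{tr} = \|\mathrm{tr}_{\neg y}(C_n\mathcal{F} - \mathcal{E}_n)\|_{tr} \tag{8}$$

$$\ge \left| \|\mathrm{tr}_{\neg y}(C_n\mathcal{F} - \mathcal{E}_n\mathcal{F})\|_{tr} - \|\mathrm{tr}_{\neg y}(\mathcal{E}_n\mathcal{F} - \mathcal{E}_n)\|_{tr} \right| \tag{9}$$

$$> \epsilon/2 \tag{10}$$

which is a contradiction. □ 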
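The step from (9) to (10) uses two facts that the proof leaves implicit: the partial trace and composition with a CPTP map do not increase the trace norm, and the reverse triangle inequality. A sketch of the chain, using the notation of Definition 7.3 and the hypothesis $\|\mathcal{E}_n - C_n\|_{tr} < \epsilon/2$:

```latex
\begin{align*}
\|\mathrm{tr}_{\neg y}(C_n\mathcal{F} - \mathcal{E}_n\mathcal{F})\|_{tr}
  &\le \|C_n - \mathcal{E}_n\|_{tr} < \epsilon/2, \\
\|\mathrm{tr}_{\neg y}(\mathcal{E}_n\mathcal{F} - \mathcal{E}_n)\|_{tr}
  &\ge \epsilon, \\
\|\mathrm{tr}_{\neg y}(C_n\mathcal{F} - \mathcal{E}_n)\|_{tr}
  &\ge \|\mathrm{tr}_{\neg y}(\mathcal{E}_n\mathcal{F} - \mathcal{E}_n)\|_{tr}
     - \|\mathrm{tr}_{\neg y}(C_n\mathcal{F} - \mathcal{E}_n\mathcal{F})\|_{tr}
   > \epsilon - \epsilon/2 = \epsilon/2 .
\end{align*}
```

The contradiction arises because the left-hand side of (8) is itself at most $\|C_n - \mathcal{E}_n\|_{tr} < \epsilon/2$.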
We call a controlled-U operation non-trivial if $U \ne I$. It is easy to prove the following. 
Lemma 7.6. Non-trivial controlled-U operations and fan-outs are 1-sensitive. 



From this we obtain the following corollary of Theorem 7.5 of which Theorem 7.1 is a special case. 



Corollary 7.7. Let $\{\mathcal{E}_n : \mathcal{D}_n \to \mathcal{D}_n\}$ denote a family of controlled-U operations or fan-outs. Any family of 
$k$D NANTC circuits $\{C_n\}$ such that $\|C_n - \mathcal{E}_n\|_{tr} < 1/2$ has size $\Omega(n)$. 

This shows that Algorithm 3 (and its $k$D generalization) have optimal size. We now show that $k$D NANTC 
circuits for $\epsilon$-sensitive operations have depth $\Omega(\sqrt[k]{n})$. 

Theorem 7.8. Let $\{\mathcal{E}_n : \mathcal{D}_n \to \mathcal{D}_n\}$ be a family of $\epsilon$-sensitive operations. Then any family of $k$D NANTC 
circuits $\{C_n\}$ such that $\|\mathcal{E}_n - C_n\|_{tr} < \epsilon/2$ for all $n$ has depth $\Omega(\sqrt[k]{n})$. 

Proof. Suppose $\{C_n\}$ has depth $o(\sqrt[k]{n})$. Assume that $\mathcal{E}_n$ is $\epsilon$-input sensitive (the case where it is $\epsilon$-output 
sensitive is very similar) and choose a qubit $y$ as in Definition 7.3. There are $\Omega(n)$ qubits $x$ such that there 
exists a CPTP map $\mathcal{F} : \mathcal{D}_n \to \mathcal{D}_n$ acting only on $x$ such that $\|\mathrm{tr}_{\neg y}(\mathcal{E}_n\mathcal{F} - \mathcal{E}_n)\|_{tr} \ge \epsilon$. Let $c > 0$ be the 
hidden constant in Lemma 7.2 when applied to the set of all such $x$. For sufficiently large $n$, the depth of 
$C_n$ is strictly less than $c\sqrt[k]{n}$. Let $G_i$ be the set of disjoint one- and two-qubit operations that are performed 
at timestep $1 \le i \le T$ in $C_n$. For an operation $M \in G_i$, let us say that $M$ is active if 

a) $M$ acts non-trivially on $y$ or 

b) there is an operation $M' \in G_j$ with $i < j \le T$ such that $M'$ is active and $M$ and $M'$ act non-trivially 
on a common qubit 

Let us say that a qubit $x$ influences $y$ if there exists an active operation $M \in G_i$ that acts non-trivially 
on $x$. Suppose $x$ influences $y$ after $T$ timesteps. Because all operations act on pairs of adjacent qubits, the 
$\ell_1$ distance between $x$ and $y$ is at most $T$. Let us choose $x$ such that $\|x - y\|_1 \ge c\sqrt[k]{n}$ as in Lemma 7.2. 
Because $T$ is equal to the depth, $T < c\sqrt[k]{n}$ so $x$ does not influence $y$. Choosing an $\mathcal{F}$ acting only on $x$ as in 
Definition 7.3, we have 

$$\|\mathrm{tr}_{\neg y}(C_n - \mathcal{E}_n)\|_{tr} = \|\mathrm{tr}_{\neg y}(C_n\mathcal{F} - \mathcal{E}_n)\|_{tr} \tag{11}$$

$$\ge \left| \|\mathrm{tr}_{\neg y}(C_n\mathcal{F} - \mathcal{E}_n\mathcal{F})\|_{tr} - \|\mathrm{tr}_{\neg y}(\mathcal{E}_n\mathcal{F} - \mathcal{E}_n)\|_{tr} \right| \tag{12}$$

$$> \epsilon/2 \tag{13}$$

which is a contradiction. □ 
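The notion of an active operation amounts to a backward pass over the layers of the circuit, starting from $y$. The sketch below illustrates this for a circuit given as a list of layers, each layer being a list of gates and each gate a tuple of the qubits it acts on; this representation and the name `influencing_qubits` are assumptions made for the example.

```python
def influencing_qubits(layers, y):
    """Return y together with every qubit touched by an active operation.

    layers[0] is the first timestep. A gate is active if it touches y or
    shares a qubit with an active gate in a later layer.
    """
    active = {y}
    for layer in reversed(layers):
        touched = set()
        for gate in layer:
            if active.intersection(gate):
                touched.update(gate)
        # Gates within a layer are disjoint, so update only after the layer.
        active |= touched
    return active
```

On a line of qubits with nearest-neighbor gates, the returned set grows by at most one position in each direction per layer, which is the distance bound used in the proof.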
By Lemma 7.6, we obtain the following corollary. 
Corollary 7.9. Let $\{\mathcal{E}_n : \mathcal{D}_n \to \mathcal{D}_n\}$ denote a family of controlled-U operations or fan-outs. Any family of 
$k$D NANTC circuits $\{C_n\}$ such that $\|C_n - \mathcal{E}_n\|_{tr} < 1/2$ has depth at least $\Omega(\sqrt[k]{n})$. 

From Theorems 5.2 and 6.1 and Corollaries 7.7 and 7.9, we conclude that Algorithm 3 and its $k$D 
generalization are optimal in their depth, size and width. 

Theorem 1.6. The depth required for controlled-U operations with $n$ controls and fan-outs with $n$ targets 
in $k$D NANTC is $\Theta(\sqrt[k]{n})$. Moreover, this depth can be achieved with size $\Theta(n)$ and width $O(n)$. 
A More Examples 

We now present the implementation of controlled-U operations in $7 \times 7$ and $9 \times 9$ 2D NANTC grids. This is 
shown for $m = 7$ in Figure 5. As before, it is necessary to uncompute the intermediate ancillas by applying 
the operations shown in the panels of Figure 5 in reverse order. We also show the case where $m = 9$ in Figure 6. In this 
case, we apply the operations shown in the panels of Figure 6 in reverse order to uncompute the intermediate ancillas. 
Figure 6: A controlled operation on a $9 \times 9$ grid 